support service discovery with JNA #9705

raphaelgavache · 2025-10-08T20:15:06Z

Add support for service discovery using JNA. A second version using JNI will replace this approach when available.

Can be disabled with
env var: DD_TRACE_SERVICE_DISCOVERY_ENABLED=false
system property: dd.trace.service.discovery.enabled=false
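A minimal sketch of how such a flag could be resolved, assuming the system property takes precedence over the environment variable and the feature defaults to enabled (as `DEFAULT_SERVICE_DISCOVERY_ENABLED = true` in the diff suggests). `ServiceDiscoveryFlag` and its method are hypothetical names for illustration, not the tracer's actual `Config` code.

```java
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper; the tracer's real lookup lives in its Config classes. */
final class ServiceDiscoveryFlag {
  static final String ENV_VAR = "DD_TRACE_SERVICE_DISCOVERY_ENABLED";
  static final String SYS_PROP = "dd.trace.service.discovery.enabled";

  /** Assumed precedence: system property, then env var, then the default (enabled). */
  static boolean isEnabled(Properties props, Map<String, String> env) {
    String raw = props.getProperty(SYS_PROP);
    if (raw == null) {
      raw = env.get(ENV_VAR);
    }
    // In this sketch only an explicit "false" turns the feature off.
    return raw == null || !"false".equalsIgnoreCase(raw.trim());
  }

  public static void main(String[] args) {
    System.out.println(isEnabled(System.getProperties(), System.getenv()));
  }
}
```

With nothing set the sketch reports the feature as enabled; setting either source to `false` disables it.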

Test instructions

On system-tests

To test on system-tests

go to the matching system-test branch https://github.com/DataDog/system-tests/pull/5502/files
download from gitlab dd-trace-api-1.55.0-SNAPSHOT.jar and dd-java-agent-1.55.0-SNAPSHOT.jar
add both jars in system-tests-binaries

./build.sh -i runner
 source venv/bin/activate
./build.sh java
./run.sh PARAMETRIC -L java tests/parametric/test_process_discovery.py::Test_ProcessDiscovery

On a linux VM

# install injector apm
DD_SITE="datadoghq.com" DD_APM_INSTRUMENTATION_ENABLED=host DD_API_KEY=x DD_APM_INSTRUMENTATION_LIBRARIES=java:1 bash -c "$(curl -L https://install.datadoghq.com/scripts/install_script_agent7.sh)"

# install java tracer commit
sudo datadog-installer install "oci://installtesting.datad0g.com/apm-library-java-package:34aa766f77fb84bd5c43ad4daaea3a149d339683"

Check file descriptors

$ java Sleep &
[1] 1346653
$ cat /proc/1346653/fd/11
schema_versiontracer_languagejavatracer_version1.55.0-SNAPSHOT~34aa766f7hostnameraphael-debian12
runtime_id$b27e9ae7-f5eb-4f84-b298-3d2c6c6cb414
                                               service_nameSleep
                                                                process_tags\entrypoint.name:sleep,entrypoint.type:class,entrypoint.workdir:raphael_gavach

Sleep is a Java class that sleeps for 50 s, used for testing.
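Instead of guessing the descriptor number, a reader could scan `/proc/<pid>/fd` for the memfd by name. The sketch below is hypothetical and not part of this PR; it relies on Linux reporting memfd links as `/memfd:<name> (deleted)` and on the `datadog-tracer-info` name used in `MemFDUnixWriter`. `MemfdLocator` is an illustrative name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical reader-side helper; not part of the PR. */
final class MemfdLocator {
  /** On Linux, readlink on a memfd descriptor yields "/memfd:<name> (deleted)". */
  static boolean isTracerMemfd(String linkTarget) {
    return linkTarget.startsWith("/memfd:datadog-tracer-info");
  }

  /** Returns the first /proc/<pid>/fd entry that points at the tracer memfd, if any. */
  static Optional<Path> find(long pid) throws IOException {
    Path fdDir = Paths.get("/proc", Long.toString(pid), "fd");
    try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
      for (Path fd : fds) {
        try {
          if (isTracerMemfd(Files.readSymbolicLink(fd).toString())) {
            return Optional.of(fd);
          }
        } catch (IOException | UnsupportedOperationException ignored) {
          // Descriptor was closed while scanning, or the entry is not a readable link.
        }
      }
    }
    return Optional.empty();
  }
}
```

The prefix match also covers a name with a trailing suffix, so it would keep working if the file name gains extra characters.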

Motivation

Additional Notes

Contributor Checklist

Format the title according to the contribution guidelines
Assign the type: and (comp: or inst:) labels in addition to any useful labels
Don't use close, fix or any linking keywords when referencing an issue.
Use solves instead, and assign the PR milestone to the issue
Update the CODEOWNERS file on source file addition, move, or deletion
Update the public documentation in case of new configuration flag or behavior

Jira ticket: [PROJ-IDENT]

datadog-datadog-prod-us1 · 2025-10-08T20:36:06Z

🎯 Code Coverage
• Patch Coverage: 37.96%
• Total Coverage: 59.59% (-0.22%)

🔗 Commit SHA: fa05b36

pr-commenter · 2025-10-08T20:49:08Z

Benchmarks

Startup

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	raphael/memfd
git_commit_date	1760635008	1760636118
git_commit_sha	`c85d09f`	`fa05b36`
release_version	1.55.0-SNAPSHOT~c85d09f004	1.55.0-SNAPSHOT~fa05b36659

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1760637950	1760637950
ci_job_id	1183514612	1183514612
ci_pipeline_id	79539409	79539409
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-j2ozgmme 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-j2ozgmme 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
module	Agent	Agent
parent	None	None

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 56 metrics, 9 unstable metrics.

Startup time reports for petclinic

gantt
    title petclinic - global startup overhead: candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.019 s) : 0, 1018865
Total [baseline] (10.685 s) : 0, 10685259
Agent [candidate] (1.025 s) : 0, 1024999
Total [candidate] (10.831 s) : 0, 10831201
section appsec
Agent [baseline] (1.195 s) : 0, 1194750
Total [baseline] (11.097 s) : 0, 11097424
Agent [candidate] (1.202 s) : 0, 1201617
Total [candidate] (10.839 s) : 0, 10839238
section iast
Agent [baseline] (1.158 s) : 0, 1157750
Total [baseline] (11.082 s) : 0, 11081535
Agent [candidate] (1.15 s) : 0, 1150424
Total [candidate] (11.125 s) : 0, 11124772
section profiling
Agent [baseline] (1.163 s) : 0, 1163190
Total [baseline] (11.023 s) : 0, 11023431
Agent [candidate] (1.172 s) : 0, 1171839
Total [candidate] (10.941 s) : 0, 10940510

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.019 s	-
Agent	appsec	1.195 s	175.885 ms (17.3%)
Agent	iast	1.158 s	138.885 ms (13.6%)
Agent	profiling	1.163 s	144.325 ms (14.2%)
Total	tracing	10.685 s	-
Total	appsec	11.097 s	412.165 ms (3.9%)
Total	iast	11.082 s	396.276 ms (3.7%)
Total	profiling	11.023 s	338.172 ms (3.2%)
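For readers checking the table, the "Δ tracing" column appears to be the difference between a variant and the tracing variant, with the percentage taken relative to the tracing duration. A small sketch of that arithmetic, assuming the raw values in the gantt chart above are microseconds (`StartupDelta` is a hypothetical name):

```java
import java.util.Locale;

/** Hypothetical helper reproducing the "Δ tracing" column; not part of the benchmark tooling. */
final class StartupDelta {
  /** Formats (variant - tracing) as "<ms> ms (<pct>%)"; both inputs are in microseconds. */
  static String delta(long tracingMicros, long variantMicros) {
    long diff = variantMicros - tracingMicros;
    double pct = 100.0 * diff / tracingMicros;
    return String.format(Locale.ROOT, "%.3f ms (%.1f%%)", diff / 1000.0, pct);
  }
}
```

For example, `StartupDelta.delta(1018865, 1194750)` reproduces the baseline Agent/appsec row from the gantt values.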

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.025 s	-
Agent	appsec	1.202 s	176.617 ms (17.2%)
Agent	iast	1.15 s	125.425 ms (12.2%)
Agent	profiling	1.172 s	146.84 ms (14.3%)
Total	tracing	10.831 s	-
Total	appsec	10.839 s	8.037 ms (0.1%)
Total	iast	11.125 s	293.57 ms (2.7%)
Total	profiling	10.941 s	109.309 ms (1.0%)

gantt
    title petclinic - break down per module: candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.471 ms) : 0, 1471
crashtracking [candidate] (1.483 ms) : 0, 1483
BytebuddyAgent [baseline] (694.199 ms) : 0, 694199
BytebuddyAgent [candidate] (697.693 ms) : 0, 697693
GlobalTracer [baseline] (242.065 ms) : 0, 242065
GlobalTracer [candidate] (243.979 ms) : 0, 243979
AppSec [baseline] (32.424 ms) : 0, 32424
AppSec [candidate] (32.831 ms) : 0, 32831
Debugger [baseline] (6.45 ms) : 0, 6450
Debugger [candidate] (6.442 ms) : 0, 6442
Remote Config [baseline] (686.49 µs) : 0, 686
Remote Config [candidate] (679.145 µs) : 0, 679
Telemetry [baseline] (9.295 ms) : 0, 9295
Telemetry [candidate] (9.546 ms) : 0, 9546
Flare Poller [baseline] (11.188 ms) : 0, 11188
Flare Poller [candidate] (11.103 ms) : 0, 11103
section appsec
crashtracking [baseline] (1.463 ms) : 0, 1463
crashtracking [candidate] (1.458 ms) : 0, 1458
BytebuddyAgent [baseline] (718.095 ms) : 0, 718095
BytebuddyAgent [candidate] (722.883 ms) : 0, 722883
GlobalTracer [baseline] (234.267 ms) : 0, 234267
GlobalTracer [candidate] (236.465 ms) : 0, 236465
IAST [baseline] (24.798 ms) : 0, 24798
IAST [candidate] (24.999 ms) : 0, 24999
AppSec [baseline] (174.891 ms) : 0, 174891
AppSec [candidate] (175.459 ms) : 0, 175459
Debugger [baseline] (6.192 ms) : 0, 6192
Debugger [candidate] (6.053 ms) : 0, 6053
Remote Config [baseline] (636.521 µs) : 0, 637
Remote Config [candidate] (620.477 µs) : 0, 620
Telemetry [baseline] (8.554 ms) : 0, 8554
Telemetry [candidate] (8.527 ms) : 0, 8527
Flare Poller [baseline] (4.805 ms) : 0, 4805
Flare Poller [candidate] (3.935 ms) : 0, 3935
section iast
crashtracking [baseline] (1.461 ms) : 0, 1461
crashtracking [candidate] (1.458 ms) : 0, 1458
BytebuddyAgent [baseline] (817.928 ms) : 0, 817928
BytebuddyAgent [candidate] (814.404 ms) : 0, 814404
GlobalTracer [baseline] (234.146 ms) : 0, 234146
GlobalTracer [candidate] (232.077 ms) : 0, 232077
IAST [baseline] (27.088 ms) : 0, 27088
IAST [candidate] (26.498 ms) : 0, 26498
AppSec [baseline] (35.59 ms) : 0, 35590
AppSec [candidate] (34.966 ms) : 0, 34966
Debugger [baseline] (6.24 ms) : 0, 6240
Debugger [candidate] (6.167 ms) : 0, 6167
Remote Config [baseline] (627.199 µs) : 0, 627
Remote Config [candidate] (595.415 µs) : 0, 595
Telemetry [baseline] (8.839 ms) : 0, 8839
Telemetry [candidate] (8.574 ms) : 0, 8574
Flare Poller [baseline] (4.339 ms) : 0, 4339
Flare Poller [candidate] (4.246 ms) : 0, 4246
section profiling
crashtracking [baseline] (1.429 ms) : 0, 1429
crashtracking [candidate] (1.446 ms) : 0, 1446
BytebuddyAgent [baseline] (722.465 ms) : 0, 722465
BytebuddyAgent [candidate] (727.042 ms) : 0, 727042
GlobalTracer [baseline] (217.674 ms) : 0, 217674
GlobalTracer [candidate] (220.078 ms) : 0, 220078
AppSec [baseline] (32.385 ms) : 0, 32385
AppSec [candidate] (32.711 ms) : 0, 32711
Debugger [baseline] (6.551 ms) : 0, 6551
Debugger [candidate] (8.414 ms) : 0, 8414
Remote Config [baseline] (767.579 µs) : 0, 768
Remote Config [candidate] (720.467 µs) : 0, 720
Telemetry [baseline] (15.198 ms) : 0, 15198
Telemetry [candidate] (14.617 ms) : 0, 14617
Flare Poller [baseline] (4.866 ms) : 0, 4866
Flare Poller [candidate] (4.194 ms) : 0, 4194
ProfilingAgent [baseline] (108.991 ms) : 0, 108991
ProfilingAgent [candidate] (109.681 ms) : 0, 109681
Profiling [baseline] (109.743 ms) : 0, 109743
Profiling [candidate] (110.294 ms) : 0, 110294

Startup time reports for insecure-bank

gantt
    title insecure-bank - global startup overhead: candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004

    dateFormat X
    axisFormat %s
section tracing
Agent [baseline] (1.024 s) : 0, 1024233
Total [baseline] (8.667 s) : 0, 8666831
Agent [candidate] (1.027 s) : 0, 1026688
Total [candidate] (8.705 s) : 0, 8704840
section iast
Agent [baseline] (1.152 s) : 0, 1152209
Total [baseline] (9.296 s) : 0, 9296380
Agent [candidate] (1.154 s) : 0, 1154356
Total [candidate] (9.304 s) : 0, 9304436

baseline results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.024 s	-
Agent	iast	1.152 s	127.976 ms (12.5%)
Total	tracing	8.667 s	-
Total	iast	9.296 s	629.549 ms (7.3%)

candidate results

Module	Variant	Duration	Δ tracing
Agent	tracing	1.027 s	-
Agent	iast	1.154 s	127.668 ms (12.4%)
Total	tracing	8.705 s	-
Total	iast	9.304 s	599.596 ms (6.9%)

gantt
    title insecure-bank - break down per module: candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004

    dateFormat X
    axisFormat %s
section tracing
crashtracking [baseline] (1.479 ms) : 0, 1479
crashtracking [candidate] (1.478 ms) : 0, 1478
BytebuddyAgent [baseline] (697.618 ms) : 0, 697618
BytebuddyAgent [candidate] (698.59 ms) : 0, 698590
GlobalTracer [baseline] (242.764 ms) : 0, 242764
GlobalTracer [candidate] (244.791 ms) : 0, 244791
AppSec [baseline] (32.832 ms) : 0, 32832
AppSec [candidate] (32.915 ms) : 0, 32915
Debugger [baseline] (6.555 ms) : 0, 6555
Debugger [candidate] (6.548 ms) : 0, 6548
Remote Config [baseline] (700.726 µs) : 0, 701
Remote Config [candidate] (695.095 µs) : 0, 695
Telemetry [baseline] (9.398 ms) : 0, 9398
Telemetry [candidate] (9.542 ms) : 0, 9542
Flare Poller [baseline] (11.702 ms) : 0, 11702
Flare Poller [candidate] (10.947 ms) : 0, 10947
section iast
crashtracking [baseline] (1.489 ms) : 0, 1489
crashtracking [candidate] (1.486 ms) : 0, 1486
BytebuddyAgent [baseline] (816.557 ms) : 0, 816557
BytebuddyAgent [candidate] (816.567 ms) : 0, 816567
GlobalTracer [baseline] (231.763 ms) : 0, 231763
GlobalTracer [candidate] (232.836 ms) : 0, 232836
IAST [baseline] (26.323 ms) : 0, 26323
IAST [candidate] (26.713 ms) : 0, 26713
AppSec [baseline] (34.999 ms) : 0, 34999
AppSec [candidate] (35.38 ms) : 0, 35380
Debugger [baseline] (6.094 ms) : 0, 6094
Debugger [candidate] (6.207 ms) : 0, 6207
Remote Config [baseline] (614.057 µs) : 0, 614
Remote Config [candidate] (597.224 µs) : 0, 597
Telemetry [baseline] (8.678 ms) : 0, 8678
Telemetry [candidate] (8.787 ms) : 0, 8787
Flare Poller [baseline] (4.219 ms) : 0, 4219
Flare Poller [candidate] (4.282 ms) : 0, 4282

Load

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	raphael/memfd
git_commit_date	1760635008	1760636118
git_commit_sha	`c85d09f`	`fa05b36`
release_version	1.55.0-SNAPSHOT~c85d09f004	1.55.0-SNAPSHOT~fa05b36659

See matching parameters

	Baseline	Candidate
application	insecure-bank	insecure-bank
ci_job_date	1760637527	1760637527
ci_job_id	1183514614	1183514614
ci_pipeline_id	79539409	79539409
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-d7em4yj2 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-d7em4yj2 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 2 performance improvements and 1 performance regressions! Performance is the same for 9 metrics, 12 unstable metrics.

scenario	Δ mean http_req_duration	Δ mean throughput	candidate mean http_req_duration	candidate mean throughput	baseline mean http_req_duration	baseline mean throughput
scenario:load:insecure-bank:iast:high_load	better [-1154.395µs; -782.723µs] or [-10.957%; -7.429%]	unstable [-8.687op/s; +97.375op/s] or [-1.969%; +22.076%]	9.567ms	485.438op/s	10.536ms	441.094op/s
scenario:load:insecure-bank:profiling:high_load	better [-623.306µs; -308.212µs] or [-6.766%; -3.346%]	unstable [-41.402op/s; +94.839op/s] or [-8.219%; +18.828%]	8.746ms	530.438op/s	9.212ms	503.719op/s
scenario:load:petclinic:profiling:high_load	worse [+1.183ms; +2.232ms] or [+2.485%; +4.690%]	unstable [-10.249op/s; +3.424op/s] or [-10.421%; +3.481%]	49.301ms	94.938op/s	47.593ms	98.350op/s

Request duration reports for insecure-bank

gantt
    title insecure-bank - request duration [CI 0.99] : candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004
    dateFormat X
    axisFormat %s
section baseline
no_agent (4.177 ms) : 4127, 4227
.   : milestone, 4177,
iast (10.536 ms) : 10351, 10720
.   : milestone, 10536,
iast_FULL (14.526 ms) : 14234, 14818
.   : milestone, 14526,
iast_GLOBAL (10.528 ms) : 10338, 10717
.   : milestone, 10528,
profiling (9.212 ms) : 9056, 9368
.   : milestone, 9212,
tracing (7.73 ms) : 7614, 7845
.   : milestone, 7730,
section candidate
no_agent (4.158 ms) : 4108, 4207
.   : milestone, 4158,
iast (9.567 ms) : 9407, 9727
.   : milestone, 9567,
iast_FULL (14.303 ms) : 14022, 14585
.   : milestone, 14303,
iast_GLOBAL (10.822 ms) : 10628, 11015
.   : milestone, 10822,
profiling (8.746 ms) : 8610, 8882
.   : milestone, 8746,
tracing (7.557 ms) : 7442, 7672
.   : milestone, 7557,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	4.177 ms [4.127 ms, 4.227 ms]	-
iast	10.536 ms [10.351 ms, 10.72 ms]	6.359 ms (152.2%)
iast_FULL	14.526 ms [14.234 ms, 14.818 ms]	10.349 ms (247.8%)
iast_GLOBAL	10.528 ms [10.338 ms, 10.717 ms]	6.351 ms (152.0%)
profiling	9.212 ms [9.056 ms, 9.368 ms]	5.035 ms (120.5%)
tracing	7.73 ms [7.614 ms, 7.845 ms]	3.553 ms (85.1%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	4.158 ms [4.108 ms, 4.207 ms]	-
iast	9.567 ms [9.407 ms, 9.727 ms]	5.409 ms (130.1%)
iast_FULL	14.303 ms [14.022 ms, 14.585 ms]	10.146 ms (244.0%)
iast_GLOBAL	10.822 ms [10.628 ms, 11.015 ms]	6.664 ms (160.3%)
profiling	8.746 ms [8.61 ms, 8.882 ms]	4.588 ms (110.4%)
tracing	7.557 ms [7.442 ms, 7.672 ms]	3.399 ms (81.8%)

Request duration reports for petclinic

gantt
    title petclinic - request duration [CI 0.99] : candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004
    dateFormat X
    axisFormat %s
section baseline
no_agent (37.264 ms) : 36962, 37566
.   : milestone, 37264,
appsec (48.457 ms) : 48010, 48904
.   : milestone, 48457,
code_origins (45.474 ms) : 45069, 45878
.   : milestone, 45474,
iast (46.059 ms) : 45666, 46452
.   : milestone, 46059,
profiling (47.593 ms) : 47128, 48059
.   : milestone, 47593,
tracing (45.169 ms) : 44781, 45557
.   : milestone, 45169,
section candidate
no_agent (36.541 ms) : 36242, 36840
.   : milestone, 36541,
appsec (48.846 ms) : 48409, 49282
.   : milestone, 48846,
code_origins (44.303 ms) : 43925, 44681
.   : milestone, 44303,
iast (44.937 ms) : 44556, 45318
.   : milestone, 44937,
profiling (49.301 ms) : 48792, 49810
.   : milestone, 49301,
tracing (44.695 ms) : 44316, 45075
.   : milestone, 44695,

baseline results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	37.264 ms [36.962 ms, 37.566 ms]	-
appsec	48.457 ms [48.01 ms, 48.904 ms]	11.193 ms (30.0%)
code_origins	45.474 ms [45.069 ms, 45.878 ms]	8.209 ms (22.0%)
iast	46.059 ms [45.666 ms, 46.452 ms]	8.795 ms (23.6%)
profiling	47.593 ms [47.128 ms, 48.059 ms]	10.329 ms (27.7%)
tracing	45.169 ms [44.781 ms, 45.557 ms]	7.905 ms (21.2%)

candidate results

Variant	Request duration [CI 0.99]	Δ no_agent
no_agent	36.541 ms [36.242 ms, 36.84 ms]	-
appsec	48.846 ms [48.409 ms, 49.282 ms]	12.304 ms (33.7%)
code_origins	44.303 ms [43.925 ms, 44.681 ms]	7.762 ms (21.2%)
iast	44.937 ms [44.556 ms, 45.318 ms]	8.396 ms (23.0%)
profiling	49.301 ms [48.792 ms, 49.81 ms]	12.76 ms (34.9%)
tracing	44.695 ms [44.316 ms, 45.075 ms]	8.154 ms (22.3%)

Dacapo

Parameters

	Baseline	Candidate
baseline_or_candidate	baseline	candidate
git_branch	master	raphael/memfd
git_commit_date	1760635008	1760636118
git_commit_sha	`c85d09f`	`fa05b36`
release_version	1.55.0-SNAPSHOT~c85d09f004	1.55.0-SNAPSHOT~fa05b36659

See matching parameters

	Baseline	Candidate
application	biojava	biojava
ci_job_date	1760638067	1760638067
ci_job_id	1183514615	1183514615
ci_pipeline_id	79539409	79539409
cpu_model	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
kernel_version	Linux runner-zfyrx7zua-project-304-concurrent-0-jfw5wngz 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux	Linux runner-zfyrx7zua-project-304-concurrent-0-jfw5wngz 6.8.0-1031-aws #33~22.04.1-Ubuntu SMP Thu Jun 26 14:22:30 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 11 metrics, 1 unstable metrics.

Execution time for tomcat

gantt
    title tomcat - execution time [CI 0.99] : candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004
    dateFormat X
    axisFormat %s
section baseline
no_agent (1.476 ms) : 1465, 1488
.   : milestone, 1476,
appsec (3.685 ms) : 3468, 3902
.   : milestone, 3685,
iast (2.21 ms) : 2147, 2274
.   : milestone, 2210,
iast_GLOBAL (2.254 ms) : 2190, 2318
.   : milestone, 2254,
profiling (2.052 ms) : 2000, 2103
.   : milestone, 2052,
tracing (2.032 ms) : 1982, 2082
.   : milestone, 2032,
section candidate
no_agent (1.473 ms) : 1462, 1485
.   : milestone, 1473,
appsec (3.693 ms) : 3475, 3911
.   : milestone, 3693,
iast (2.199 ms) : 2135, 2262
.   : milestone, 2199,
iast_GLOBAL (2.242 ms) : 2178, 2306
.   : milestone, 2242,
profiling (2.047 ms) : 1996, 2098
.   : milestone, 2047,
tracing (2.016 ms) : 1967, 2065
.   : milestone, 2016,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.476 ms [1.465 ms, 1.488 ms]	-
appsec	3.685 ms [3.468 ms, 3.902 ms]	2.209 ms (149.6%)
iast	2.21 ms [2.147 ms, 2.274 ms]	734.075 µs (49.7%)
iast_GLOBAL	2.254 ms [2.19 ms, 2.318 ms]	777.759 µs (52.7%)
profiling	2.052 ms [2.0 ms, 2.103 ms]	575.539 µs (39.0%)
tracing	2.032 ms [1.982 ms, 2.082 ms]	555.937 µs (37.7%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	1.473 ms [1.462 ms, 1.485 ms]	-
appsec	3.693 ms [3.475 ms, 3.911 ms]	2.22 ms (150.7%)
iast	2.199 ms [2.135 ms, 2.262 ms]	725.459 µs (49.2%)
iast_GLOBAL	2.242 ms [2.178 ms, 2.306 ms]	768.986 µs (52.2%)
profiling	2.047 ms [1.996 ms, 2.098 ms]	574.127 µs (39.0%)
tracing	2.016 ms [1.967 ms, 2.065 ms]	543.356 µs (36.9%)

Execution time for biojava

gantt
    title biojava - execution time [CI 0.99] : candidate=1.55.0-SNAPSHOT~fa05b36659, baseline=1.55.0-SNAPSHOT~c85d09f004
    dateFormat X
    axisFormat %s
section baseline
no_agent (14.945 s) : 14945000, 14945000
.   : milestone, 14945000,
appsec (15.037 s) : 15037000, 15037000
.   : milestone, 15037000,
iast (18.606 s) : 18606000, 18606000
.   : milestone, 18606000,
iast_GLOBAL (18.166 s) : 18166000, 18166000
.   : milestone, 18166000,
profiling (15.626 s) : 15626000, 15626000
.   : milestone, 15626000,
tracing (15.139 s) : 15139000, 15139000
.   : milestone, 15139000,
section candidate
no_agent (14.913 s) : 14913000, 14913000
.   : milestone, 14913000,
appsec (14.875 s) : 14875000, 14875000
.   : milestone, 14875000,
iast (18.703 s) : 18703000, 18703000
.   : milestone, 18703000,
iast_GLOBAL (18.047 s) : 18047000, 18047000
.   : milestone, 18047000,
profiling (15.263 s) : 15263000, 15263000
.   : milestone, 15263000,
tracing (15.07 s) : 15070000, 15070000
.   : milestone, 15070000,

baseline results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	14.945 s [14.945 s, 14.945 s]	-
appsec	15.037 s [15.037 s, 15.037 s]	92.0 ms (0.6%)
iast	18.606 s [18.606 s, 18.606 s]	3.661 s (24.5%)
iast_GLOBAL	18.166 s [18.166 s, 18.166 s]	3.221 s (21.6%)
profiling	15.626 s [15.626 s, 15.626 s]	681.0 ms (4.6%)
tracing	15.139 s [15.139 s, 15.139 s]	194.0 ms (1.3%)

candidate results

Variant	Execution Time [CI 0.99]	Δ no_agent
no_agent	14.913 s [14.913 s, 14.913 s]	-
appsec	14.875 s [14.875 s, 14.875 s]	-38.0 ms (-0.3%)
iast	18.703 s [18.703 s, 18.703 s]	3.79 s (25.4%)
iast_GLOBAL	18.047 s [18.047 s, 18.047 s]	3.134 s (21.0%)
profiling	15.263 s [15.263 s, 15.263 s]	350.0 ms (2.3%)
tracing	15.07 s [15.07 s, 15.07 s]	157.0 ms (1.1%)

dd-trace-core/src/main/java/datadog/trace/core/CoreTracer.java

dd-trace-core/src/main/java/datadog/trace/core/ServiceDiscovery.java

bm1549 · 2025-10-09T17:50:37Z

@raphaelgavache iiuc GraalVM and Spring Native are not expected to work because of the way native libraries work with native image. So long as it won't crash or cause other adverse behavior in those scenarios, I think we're good

dougqh · 2025-10-09T18:01:30Z

> @raphaelgavache iiuc GraalVM and Spring Native are not expected to work because of the way native libraries work with native image. So long as it won't crash or cause other adverse behavior in those scenarios, I think we're good

Yes, Graal native image support is a while off for any native library usage.
In this case, the problem is JNA itself because JNA's dynamic class generation doesn't work well with the AoT approach of Graal native image.

dougqh · 2025-10-09T20:19:35Z

dd-trace-core/src/main/java/datadog/trace/core/servicediscovery/ServiceDiscovery.java

+public class ServiceDiscovery {
+  private static final Logger log = LoggerFactory.getLogger(ServiceDiscovery.class);
+
+  private static final byte[] SCHEMA_VERSION = "schema_version".getBytes(ISO_8859_1);


For this particular case, I'd rather not have the constants.
The problem is that they end up permanently consuming memory when we really only intend to use them once.

Admittedly, this is a weird case and it has got me thinking about whether we want to just unload the class, but that's on platform to figure out.

dougqh · 2025-10-09T20:22:33Z

dd-trace-core/src/main/java/datadog/trace/core/servicediscovery/ServiceDiscovery.java

+            TracerVersion.TRACER_VERSION,
+            config.getHostName(),
+            config.getRuntimeId(),
+            config.getServiceName(),


This is using the statically configured service name which isn't necessarily the service name that the tracer will use in the end. Are we okay with that?

We might also want to consider moving this to a background task, so it doesn't impact start-up.

dougqh · 2025-10-09T20:30:33Z

dd-trace-core/src/main/java/datadog/trace/core/servicediscovery/ServiceDiscovery.java

+    mapElements += (processTags != null && processTags.length() > 0) ? 1 : 0;
+    mapElements += (containerID != null && !containerID.isEmpty()) ? 1 : 0;
+
+    SimpleUtf8Cache encodingCache = new SimpleUtf8Cache(256);


I would skip the cache in this case.
Since the code creates a new cache each time, the cache provides no benefit here. The cache is actually slower than encoding directly; it only helps reduce allocation when the same strings are encoded again and again.

But since this code is only called a handful of times, there's not much chance to save on allocation.
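A rough sketch of encoding directly, with neither static `byte[]` constants nor a cache. It assumes the payload is a MessagePack map of strings; the `$` preceding the 36-character runtime id in the dump in the PR description is consistent with a `str 8` length byte, but that is an inference. `OneShotEncoder` is a hypothetical name, not the PR's `ServiceDiscovery` implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical one-shot encoder; assumes a MessagePack map of short strings. */
final class OneShotEncoder {
  static byte[] encode(Map<String, String> fields) {
    if (fields.size() > 15) {
      throw new IllegalArgumentException("sketch only handles fixmap (<= 15 entries)");
    }
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(0x80 | fields.size()); // fixmap header
    for (Map.Entry<String, String> e : fields.entrySet()) {
      writeStr(out, e.getKey());
      writeStr(out, e.getValue());
    }
    return out.toByteArray();
  }

  private static void writeStr(ByteArrayOutputStream out, String s) {
    // Encode on the spot: the bytes become garbage as soon as the payload is written.
    byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
    if (utf8.length <= 31) {
      out.write(0xa0 | utf8.length); // fixstr
    } else if (utf8.length <= 0xff) {
      out.write(0xd9); // str 8
      out.write(utf8.length);
    } else if (utf8.length <= 0xffff) {
      out.write(0xda); // str 16, big-endian length
      out.write(utf8.length >>> 8);
      out.write(utf8.length & 0xff);
    } else {
      throw new IllegalArgumentException("string too long for this sketch");
    }
    out.write(utf8, 0, utf8.length);
  }
}
```

Because every string is encoded exactly once per call, nothing stays reachable after the write, which is the point of both review comments.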

amarziali · 2025-10-10T17:19:04Z

I think the write should be done asynchronously (not in premain), since the startup time skyrocketed. Ideally, the related class loading would also be deferred to a scheduled task that runs after the tracer has started.
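The deferral could look roughly like the following. This is a sketch only: it uses a plain `java.util.concurrent` scheduler as a stand-in for the tracer's own task scheduling, and `DeferredWrite` is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for scheduling the write on the tracer's task scheduler. */
final class DeferredWrite {
  /** Runs writeTask once after the delay on a daemon thread; the future completes when it ends. */
  static CompletableFuture<Void> schedule(Runnable writeTask, long delayMillis) {
    ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(
            r -> {
              Thread thread = new Thread(r, "service-discovery-writer");
              thread.setDaemon(true); // never keep the JVM alive for this task
              return thread;
            });
    CompletableFuture<Void> done = new CompletableFuture<>();
    scheduler.schedule(
        () -> {
          try {
            writeTask.run(); // native classes would load here, off the startup path
          } catch (Throwable ignored) {
            // Best effort: a discovery failure must not affect the application.
          } finally {
            done.complete(null);
          }
        },
        delayMillis,
        TimeUnit.MILLISECONDS);
    scheduler.shutdown(); // the already-scheduled delayed task still runs by default
    return done;
  }
}
```

Since the task body is the first place that touches the writer, any class loading it triggers also moves off the startup thread.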

github-actions · 2025-10-14T12:52:17Z

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

dd-trace-core/src/main/java/datadog/trace/core/servicediscovery/ServiceDiscovery.java

PerfectSlayer · 2025-10-15T12:39:07Z

dd-trace-api/src/main/java/datadog/trace/api/ConfigDefaults.java

  static final boolean DEFAULT_TELEMETRY_LOG_COLLECTION_ENABLED = true;
  static final int DEFAULT_TELEMETRY_DEPENDENCY_RESOLUTION_QUEUE_SIZE = 100000;

+  static final boolean DEFAULT_SERVICE_DISCOVERY_ENABLED = true;


❔ question: Are there some other products doing JNA at startup by default?

I don't know about the JNA part, but the memfd has been enabled by default on all tracers except Java for multiple months now. Agent products consume it.

dd-trace-core/src/main/java/datadog/trace/core/servicediscovery/ForeignMemoryWriter.java

dd-trace-core/src/main/java/datadog/trace/core/servicediscovery/ServiceDiscoveryFactory.java

PerfectSlayer · 2025-10-15T12:40:56Z

dd-trace-core/src/main/java/datadog/trace/core/CoreTracer.java

        tagInterceptor,
        strictTraceWrites,
        instrumentationGateway,
+        null, // you might refactor this as well


❔ question: Leftover? Have a no-op instead?

Just removed the comment. I'm not sure about the difference between null and a no-op class, or where to find a good example to follow,
so I stayed with null, but I can revisit it.

NoOp class will probably be the better option. Especially if you use classes rather than interfaces.
With classes, class hierarchy analysis optimizations will kick-in, so after JIT-ing, you still get a direct call with no type checks before the call.

But the more important thing is just to make the whole thing async

it's async now since this change

PerfectSlayer

📝 notes: I just realised I did not submit the main comment about my review 😅
Cleaning up the initialization was one of the reasons to use a Supplier in the first place

After Doug's comment, serviceDiscoveryFactory() should even return the NOOP rather than null.
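A small sketch of the no-op pattern discussed in this thread, using a class rather than an interface as Doug suggests. All names here (`DiscoveryWriter`, `RecordingWriter`, `WriterSelection`) are hypothetical and only illustrate the shape; they are not the PR's classes.

```java
/** Hypothetical base class whose default behavior is to do nothing. */
class DiscoveryWriter {
  static final DiscoveryWriter NOOP = new DiscoveryWriter();

  /** No-op by default, so callers never need a null check. */
  void write(byte[] payload) {}
}

/** Hypothetical real implementation; it only counts bytes for illustration. */
final class RecordingWriter extends DiscoveryWriter {
  int bytesWritten;

  @Override
  void write(byte[] payload) {
    bytesWritten += payload.length;
  }
}

final class WriterSelection {
  /** Returns NOOP instead of null when the feature is off or unsupported. */
  static DiscoveryWriter select(boolean enabled, boolean linux) {
    return (enabled && linux) ? new RecordingWriter() : DiscoveryWriter.NOOP;
  }
}
```

Call sites can then invoke `write` unconditionally, whichever instance was selected.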

PerfectSlayer · 2025-10-15T15:50:50Z

dd-java-agent/agent-tooling/src/main/java/datadog/trace/agent/tooling/TracerInstaller.java

+
+        maybeEnableServiceDiscovery(tracerBuilder);
+
+        installGlobalTracer(tracerBuilder.build());


Suggested change

-        installGlobalTracer(tracerBuilder.build());
+            .pollForTracingConfiguration()
+            .serviceDiscoveryFactory(TracerInstaller::serviceDiscoveryFactory)
+            .build();
+        installGlobalTracer(tracer);

where I would directly use a method reference to create the MemFDUnixWriter which can return null if the feature is disabled or unavailable.

@SuppressForbidden // intentional use of Class.forName
private static ServiceDiscoveryFactory serviceDiscoveryFactory() {
  if (!Config.get().isServiceDiscoveryEnabled()) {
    return null;
  }
  if (!OperatingSystem.isLinux()) {
    log.debug("service discovery not supported outside linux");
    return null;
  }
  // make sure this branch is not considered possible for graalvm artifact
  if (Platform.isNativeImageBuilder() || Platform.isNativeImage()) {
    log.debug("service discovery not supported on native images");
    return null;
  }
  try {
    // use reflection to load MemFDUnixWriter so it doesn't get picked up when we
    // transitively look for all tracer class dependencies to install in GraalVM via
    // VMRuntimeInstrumentation
    Class<?> memFdClass =
        Class.forName("datadog.trace.agent.tooling.servicediscovery.MemFDUnixWriter");
    ForeignMemoryWriter memFd = (ForeignMemoryWriter) memFdClass.getConstructor().newInstance();
    return new ServiceDiscovery(memFd);
  } catch (Throwable e) {
    log.debug("service discovery not supported", e);
    return null;
  }
}

I'm not familiar enough with factory patterns and can't get this to compile, sorry. Would it be possible for you to commit what you have in mind directly, please?

Sure, will do later this week 👌

Awesome, thanks!

What's the benefit of introducing a method reference and always calling serviceDiscoveryFactory?

I could imagine you might want to defer all service-discovery logic, but IMHO the current approach is more readable and skips setting any factory when we know up-front that it's not possible or applicable.

Also note that if you do set a serviceDiscoveryFactory then that will always trigger a call in CoreTracer to schedule an async task via AgentTaskScheduler to evaluate that factory. On platforms where we know up-front that service discovery is not possible/applicable this will lead to an unnecessary call to schedule a task which will do nothing.

Given this I would leave this code as it is today.
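The trade-off described here can be sketched as follows: the installer decides up-front whether to set a factory at all, and the tracer only schedules a task when one was set. `SketchTracer` and `SketchInstaller` are hypothetical names, and the `scheduled` list is a stand-in for an asynchronous task scheduler; none of this is the actual `CoreTracer` code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical tracer that schedules discovery only when a factory was provided. */
final class SketchTracer {
  final List<Runnable> scheduled = new ArrayList<>(); // stand-in for a task scheduler

  SketchTracer(Supplier<Runnable> discoveryFactory) {
    if (discoveryFactory != null) {
      scheduled.add(
          () -> {
            Runnable discovery = discoveryFactory.get(); // evaluated lazily, off startup
            if (discovery != null) {
              discovery.run();
            }
          });
    }
  }
}

final class SketchInstaller {
  /** Up-front check: no factory is set when the platform cannot support the feature. */
  static SketchTracer install(boolean enabled, boolean linux, Runnable discovery) {
    Supplier<Runnable> factory = (enabled && linux) ? () -> discovery : null;
    return new SketchTracer(factory);
  }
}
```

With this shape, an unsupported platform never pays for a scheduled task that would do nothing, while the factory itself is still evaluated lazily where it applies.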

amarziali

Feature-wise, looks good to me. The implementation does not affect performance indicators, and the feature can be opted out of.
@raphaelgavache can you please add deactivation instructions to the PR description?
Also, @PerfectSlayer raised some possible improvements that should be considered before merging this or addressed in a follow-up PR.

dd-trace-core/src/main/java/datadog/trace/core/servicediscovery/ServiceDiscovery.java

mcculls · 2025-10-16T13:17:58Z

internal-api/src/main/java/datadog/trace/api/Config.java

    return instrumenterConfig.isTraceEnabled();
  }

+  public boolean isServiceDiscoveryEnabled() {


thanks for adding a feature-flag to control this!

dd-trace-api/src/main/java/datadog/trace/api/config/GeneralConfig.java

...gent-tooling/src/main/java/datadog/trace/agent/tooling/servicediscovery/MemFDUnixWriter.java

mcculls

LGTM, thanks for putting this together

Co-authored-by: Stuart McCulloch <[email protected]>

PerfectSlayer

Just pushed factory refactoring

PerfectSlayer · 2025-10-16T13:17:07Z

...gent-tooling/src/main/java/datadog/trace/agent/tooling/servicediscovery/MemFDUnixWriter.java

+
+    NativeLong written = libc.write(memFd, buf, new NativeLong(payload.length));
+    if (written.longValue() != payload.length) {
+      log.warn("write to memfd failed errno={}", Native.getLastError());


❔ question: Should we clear the memfd if write failed?

it is safe to have it partially written and open, so I think the less libc interaction the better

raphaelgavache · 2025-10-16T17:31:32Z

...gent-tooling/src/main/java/datadog/trace/agent/tooling/servicediscovery/MemFDUnixWriter.java

+  public void write(byte[] payload) {
+    final LibC libc = Native.load("c", LibC.class);
+
+    int memFd = libc.memfd_create("datadog-tracer-info", MFD_CLOEXEC | MFD_ALLOW_SEALING);


Just realised that the agent implementation, unlike the system tests, matches on an additional `-`;
adding it to the file name.

raphaelgavache changed the title ~~try to plug memfd to core-tracer~~ support service discovery with JNA Oct 8, 2025

raphaelgavache force-pushed the raphael/memfd branch from f852d0d to a46f9e5 Compare October 8, 2025 20:51

raphaelgavache commented Oct 8, 2025

dd-trace-core/src/main/java/datadog/trace/core/CoreTracer.java Outdated

raphaelgavache force-pushed the raphael/memfd branch 3 times, most recently from d8eb447 to 65ffa65 Compare October 8, 2025 22:09

mcculls reviewed Oct 9, 2025

dd-trace-core/src/main/java/datadog/trace/core/ServiceDiscovery.java Outdated

mcculls reviewed Oct 9, 2025

dd-trace-core/src/main/java/datadog/trace/core/ServiceDiscovery.java Outdated

dougqh reviewed Oct 9, 2025

mcculls force-pushed the raphael/memfd branch from f41114c to bdc4e71 Compare October 13, 2025 15:35

raphaelgavache marked this pull request as ready for review October 14, 2025 12:52

raphaelgavache requested a review from a team as a code owner October 14, 2025 12:52

raphaelgavache requested a review from amarziali October 14, 2025 12:52

raphaelgavache added comp: core Tracer core type: enhancement Enhancements and improvements labels Oct 14, 2025

raphaelgavache mentioned this pull request Oct 15, 2025

service discovery - enable java test DataDog/system-tests#5502

Merged

6 tasks

amarziali reviewed Oct 15, 2025

dd-trace-core/src/main/java/datadog/trace/core/servicediscovery/ServiceDiscovery.java

raphaelgavache and others added 5 commits October 15, 2025 12:10

try to plug memfd to core-tracer

adb4fad

prepare encoding

d35a191

add proper encoding

8bb56e6

suggestions

e76f4f9

add groovy test

b20fefb

raphaelgavache and others added 4 commits October 15, 2025 12:10

match json spec for python decoder in system-tests

cd753d1

update after review

8113f76

add config to disable

221c556

Rearrange config

382c311

amarziali force-pushed the raphael/memfd branch from 924872b to 382c311 Compare October 15, 2025 10:10

PerfectSlayer reviewed Oct 15, 2025

raphaelgavache added 2 commits October 15, 2025 09:53

update after review

51626d3

Merge branch 'master' into raphael/memfd

2ec1874

PerfectSlayer reviewed Oct 15, 2025

amarziali approved these changes Oct 16, 2025
